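Product description generation is a challenging and under-explored task. Most existing work takes a set of product attributes as input and generates a description from scratch in a single pass. However, this widespread paradigm can be limiting when users have dynamic wishes for constraining the description, such as deleting or adding the content of a user-specified attribute relative to a previous version. To address this challenge, we explore a new draft-edit manner of description generation, which leads to the proposed new task of controllable text editing in e-commerce. More specifically, we allow the system to receive a command (delete or add) from the user and then generate a description by flexibly modifying the content of the previous version. Modifying a previous version rather than starting from scratch makes it easier and more practical to meet new requirements. In addition, we design a data augmentation method to remedy the low-resource challenge in this task; it combines a model-based strategy and a rule-based strategy to imitate human edits. To accompany this new task, we introduce a human-written command-edit dataset called e-cedits and a new metric, "Attribute Edit". Our experimental results show that using the new data augmentation method outperforms the baselines by a larger margin in both automatic and human evaluations.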
translated by 谷歌翻译
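Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems. Recent adaptive training methods promote the output of infrequent words by emphasizing their weights in the overall training objective. Although the recall of low-frequency words improves, their prediction precision is unexpectedly hindered by the adaptive objectives. Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective. Specifically, we propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words in a soft contrastive way based on the corresponding word frequencies. We conduct experiments on the widely used NIST Chinese-English and WMT14 English-German translation tasks. Empirical results show that our proposed method not only significantly improves translation quality but also enhances lexical diversity and optimizes the word representation space. Further investigation reveals that, compared with related adaptive training strategies, the advantage of our method on low-frequency word prediction lies in the robustness of token-level recall across different frequencies without sacrificing precision.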
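The soft, frequency-based contrastive objective described above can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: the InfoNCE-style form, the choice of positive example, the hypothetical `rarity_weight` function, and the temperature `tau`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rarity_weight(count, max_count):
    """Hypothetical soft weight: 1.0 for the most frequent token, larger for rarer ones."""
    return 1.0 + math.log(max_count / count)

def soft_contrastive_loss(anchor, positive, negatives, anchor_count, max_count, tau=0.1):
    """InfoNCE-style loss for one decoding step (an assumed form, not the paper's).

    anchor:    hidden state at the current decoding step
    positive:  a hidden state treated as equivalent to the anchor (assumption)
    negatives: hidden states belonging to other target words
    The negative term is scaled by the rarity of the anchor's target word,
    so rarer words are pushed away from other words more strongly.
    """
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    weight = rarity_weight(anchor_count, max_count)
    return -math.log(pos / (pos + weight * neg))
```

With identical geometry, a rare anchor word yields a larger loss than a frequent one, which is the "soft" frequency dependence the sketch is meant to show.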
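Generative commonsense reasoning requires machines to generate sentences describing an everyday scenario given several concepts, and it has attracted much attention recently. However, existing models do not perform as well as humans, because the sentences they produce are often implausible and grammatically incorrect. In this paper, inspired by the process by which humans create sentences, we propose a novel knowledge-enhanced commonsense generation framework, termed KGR^4, consisting of four stages: Retrieval, Retrospect, Refine, and Rethink. Under this framework, we first perform retrieval to search an external corpus for relevant sentences to use as prototypes. Then, we train a generator to edit or copy these prototypes to produce candidate sentences, which an autoencoder-based refiner subsequently repairs. Finally, we select the output sentence from the candidate sentences produced by generators with different hyperparameters. Experimental results and in-depth analysis on the CommonGen benchmark strongly demonstrate the effectiveness of our framework. In particular, KGR^4 obtains 33.56 SPICE points on the official leaderboard, outperforming the previously reported best result by 2.49 SPICE points and achieving state-of-the-art performance.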
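Interactive and non-interactive models are the two de facto standard frameworks in vector-based cross-lingual information retrieval (V-CLIR); they embed queries and documents in synchronous and asynchronous fashions, respectively. In terms of retrieval accuracy and computational efficiency, each model has its own strengths and shortcomings. In this paper, we propose a novel framework that leverages the advantages of both paradigms. Specifically, we introduce a semi-interactive mechanism, which builds our model on a non-interactive architecture but encodes each document together with its associated multilingual queries. As a result, cross-lingual features can be learned better, as in an interactive model. In addition, we further transfer knowledge from a well-trained interactive model to ours by reusing its word embeddings and adopting knowledge distillation. Our model is initialized from the multilingual pre-trained language model M-BERT and evaluated on datasets derived from Wikipedia and an in-house dataset collected from a real-world search engine. Extensive analyses show that our method significantly improves retrieval accuracy while maintaining computational efficiency.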
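The knowledge-distillation step mentioned above can be sketched in a few lines. The abstract does not state the distillation objective, so this assumes a common choice: a KL divergence between temperature-softened relevance distributions over a list of candidate documents. The function names and the default `temperature` are illustrative.

```python
import math

def softmax(scores, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) over softened relevance scores (an assumed objective).

    teacher_scores: relevance scores from the interactive (teacher) model
    student_scores: relevance scores from the non-interactive (student) model
    Both lists score the same candidate documents for one query.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student ranks candidates with the same softened distribution as the teacher and grows as the two distributions diverge.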
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.
Benefiting from its capability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing away those from other clusters, by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
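The cluster-guided objective described above can be sketched as follows. The abstract specifies only that cross-view cosine similarity is maximized for same-cluster samples and minimized between centers of different clusters; the squared-error form, the equal weighting of the two terms, and the function names are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def center(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cluster_contrastive_loss(view1, view2, labels):
    """Assumed loss over high-confidence nodes embedded by two unshared encoders.

    view1, view2: embeddings of the same nodes from the two views
    labels:       high-confidence cluster id of each node
    """
    # Positive term: cross-view pairs from the same cluster should have cosine 1.
    pos = [(1.0 - cosine(a, b)) ** 2
           for a, la in zip(view1, labels)
           for b, lb in zip(view2, labels) if la == lb]
    # Negative term: cross-view centers of different clusters should have cosine 0.
    clusters = sorted(set(labels))
    c1 = {k: center([v for v, l in zip(view1, labels) if l == k]) for k in clusters}
    c2 = {k: center([v for v, l in zip(view2, labels) if l == k]) for k in clusters}
    neg = [cosine(c1[p], c2[q]) ** 2 for p in clusters for q in clusters if p != q]
    pos_loss = sum(pos) / len(pos)
    neg_loss = sum(neg) / len(neg) if neg else 0.0
    return pos_loss + neg_loss
```

The loss vanishes when each cluster is tight and different cluster centers are orthogonal across views, and it grows when cluster assignments cut across the embedding geometry.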
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment returns, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions concerning the equality of frames' importance, the effectiveness of the importance map, and the connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
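The masking-and-scoring loop described above can be illustrated with a toy sketch. Here `evaluate` is a hypothetical stand-in for "retrain the IL model on the masked demonstrations and measure the environment return", and averaging the return over the masks that kept each frame is one simple normalization chosen for illustration, not necessarily the one R2RISE uses.

```python
import random

def importance_map(num_frames, evaluate, num_masks=200, keep_prob=0.5, seed=0):
    """Estimate per-frame importance from returns obtained under random masks."""
    rng = random.Random(seed)
    totals = [0.0] * num_frames
    counts = [0] * num_frames
    for _ in range(num_masks):
        # 1 keeps a demonstration frame, 0 masks it out.
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        score = evaluate(mask)  # return achieved after training on the masked demos
        for i, kept in enumerate(mask):
            if kept:
                totals[i] += score
                counts[i] += 1
    # Average return over the masks in which frame i was kept.
    return [totals[i] / counts[i] if counts[i] else 0.0 for i in range(num_frames)]

def toy_evaluate(mask):
    """Hypothetical return: frame 2 matters most, frame 0 matters a little."""
    return 3.0 * mask[2] + 1.0 * mask[0]

scores = importance_map(5, toy_evaluate)
```

In this toy setting the highest score lands on frame 2, the frame that contributes most to the return. In a real setting each call to `evaluate` involves a full retraining run, so the number of masks is the main cost driver.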